A graphics accelerator is disclosed that achieves high performance at a
relatively low cost by overcoming a variety of system constraints. The
graphics accelerator comprises a command preprocessor for translating
differing geometry input data formats into a standard format, a set of
floating-point processors optimized for three dimensional graphics
functions, and a set of draw processors that concurrently perform
edgewalking and scan interpolation rendering functions for separate
portions of a triangle.
BACKGROUND OF THE INVENTION

1. FIELD OF THE INVENTION: 

This invention relates to the field of computer graphics systems. More
particularly, this invention relates to an architecture for a high
performance three dimensional graphics accelerator in a computer system.

2. ART BACKGROUND: 

A three dimensional graphics accelerator is a specialized graphics
rendering subsystem for a computer system. Typically, an application
program executing on a host processor of the computer system generates
three dimensional geometry input data that defines three dimensional
graphics elements for display on a display device. The application
program typically transfers the geometry input data from the host
processor to the graphics accelerator. Thereafter, the graphics
accelerator renders the corresponding graphics elements on the display
device.

The design architecture of a high performance three dimensional graphics
system historically embodies a balance between system performance and
system cost. The typical design goal is to increase system performance
while minimizing increases in system cost. However, prior graphics
systems usually suffer from either limited performance or high cost due
to a variety of system constraints.

For example, a high performance graphics system typically implements an
interleaved frame buffer comprised of multiple video random access memory
(VRAM) banks, because the minimum read-modify-write cycle time of
commercially available VRAM chips is a fundamental constraint on
rendering performance. Multiple interleaved VRAM banks enable parallel
pixel rendering into the frame buffer, which increases overall rendering
performance. Unfortunately, the separate addressing logic required for
each interleaved VRAM bank increases the cost and power consumption of
such high performance systems.

On the other hand, a graphics system may implement a rendering processor
on a single integrated circuit chip to minimize cost and power
consumption. Unfortunately, such systems suffer from poor rendering
performance due to the limited number of interface pins available with
the single integrated circuit chip. The limited number of interface pins
reduces the interleave factor for the frame buffer, thereby precluding
the rendering performance benefits of parallel processing.

Another graphics system constraint is the proliferation of differing three
dimensional geometry input data formats that define similar drawing
functions. A graphics system is typically required to support many of
the differing geometry input data formats. Some prior graphics systems
support the differing geometry formats in graphics processor micro-code.
However, such a solution greatly increases the size and complexity of the
graphics processor micro-code, thereby increasing system cost and
decreasing system performance. Other prior graphics systems support the
differing geometry formats by employing a host processor to translate the
differing formats into a standard format for the graphics processor.
Unfortunately, such format translation by the host processor creates a
system bottleneck that may severely limit overall graphics system
performance.

In addition, prior graphics systems often perform transformation, clip
test, face determination, lighting, clipping, screen space conversion,
and setup functions using commercially available digital signal
processing (DSP) chips. However, such DSP chips are typically not
optimized for three dimensional computer graphics. The internal registers
provided in a typical DSP chip are too few in number to accommodate the
inner loops of most three dimensional graphics processing algorithms. In
such systems, on-chip data caches or SRAMs are typically employed to
compensate for the limited number of fast internal registers provided by
the DSP chip. However, such on-chip data caches usually implement
scheduling algorithms that are not controllable. Moreover, such on-chip
SRAMs are usually not suitable for a multi-processing environment.

Also, DSP chips typically require an assortment of support chips to
function in a multi-processing environment. Unfortunately, the addition
of the support chips to a graphics system increases printed circuit board
area, increases system power consumption, increases heat generation, and
increases system cost.

Prior graphics systems often employ a parallel processing pipeline to
increase graphics processing performance. For example, the scan
conversion function for a shaded triangle in a graphics system is
typically performed by a linear pipeline of edgewalking and scan
interpolation. Typically in such systems, the edgewalking function is
performed by an edgewalking processor, and the scan interpolation
function is performed by a set of parallel scan interpolation processors
that receive parameters from the edgewalking processor.

However, such systems fail to obtain parallel processing speed benefits
when rendering relatively long, thin triangles, which are commonly
encountered in tessellated geometry. The parameter data flow between the
edgewalking processor and the scan interpolation processors greatly
increases when performing scan conversion on long, thin triangles,
because each scan line of such a triangle covers only a few pixels, so
the edgewalking processor must deliver a fresh set of span parameters for
very little scan interpolation work. Unfortunately, the increased
parameter data flow slows triangle rendering and reduces graphics system
performance.

As will be described, the present invention is a graphics accelerator
that achieves high performance at a relatively low cost by overcoming the
variety of system constraints discussed above. The present graphics
accelerator comprises a command preprocessor for translating the
differing geometry input data formats, a set of floating-point processors
optimized for three dimensional graphics functions, and a set of draw
processors that concurrently perform edgewalking and scan interpolation
rendering functions for separate portions of a geometry object.

SUMMARY OF THE INVENTION 

A high performance three dimensional graphics accelerator in a computer
system is disclosed. The graphics accelerator has a command preprocessor
for translating geometry input data from differing formats. The command
preprocessor implements both a 3D geometry pipeline and a direct port
pipeline. The 3D geometry pipeline of the command preprocessor accesses
an input vertex packet over a host bus using either programmed
input/output or direct memory access. The command preprocessor reformats
the input vertex packet into a reformatted vertex packet, and then
transfers the reformatted vertex packet to an available floating-point
processor over a floating-point bus as an output geometry packet with
optional data substitutions and data compression.

A set of four floating-point processors is coupled for communication over
the floating-point bus. The first available floating-point processor
receives the output geometry packet over the floating-point bus, and
generates a draw packet containing parameters for a screen space geometry
object. The floating-point processor transfers the draw packet to a set
of five draw processors over a draw bus. Each floating-point processor
implements specialized features and instructions for performing three
dimensional graphics functions.

The command preprocessor controls the transfer of output geometry packets
into the floating-point processors, and the flow of draw packets to the
draw processors.

The five draw processors concurrently receive each draw packet over the
draw bus. Each draw processor performs edgewalking and scan interpolation
functions to render the three dimensional triangle defined by the draw
packet. Each draw processor renders every fifth pixel on a scan line,
such that the five draw processors taken together render the entire
triangle. Each draw processor renders pixels into a separate interleave
bank of a five bank interleaved frame buffer. In addition, each draw
processor receives and processes direct port data from the direct port
pipeline of the command preprocessor.



BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a computer system including a host
processor, a memory subsystem, a graphics accelerator, and a display
device.

Figure 2 is a block diagram of the graphics accelerator, which is
comprised of a command preprocessor, a set of floating-point processors,
a set of draw processors, a frame buffer, a post-processor, and a random
access memory/digital-to-analog converter (RAMDAC).

Figure 3 is a block diagram of the command preprocessor, which shows the
reformatting circuitry of the 3D geometry pipeline, along with the direct
port data pipeline.

Figure 4 is a block diagram of a floating-point processor section,
including a control store (CS), an input circuit, an output circuit, a
register file, a set of functional units, a control circuit, and an SRAM
interface circuit.

Figure 5 is a block diagram of the draw processor, which is comprised of
an input buffer, a rendering circuit, and a memory control circuit.

DETAILED DESCRIPTION OF THE INVENTION

An architecture for a high performance three dimensional graphics
accelerator in a computer system is disclosed. In the following
description, for purposes of explanation, specific applications, numbers,
apparatus, configurations, and circuits are set forth in order to provide
a thorough understanding of the present invention. However, it will be
apparent to one skilled in the art that the present invention may be
practiced without these specific details. In other instances, well known
systems are shown in diagrammatical or block diagram form in order not to
obscure the present invention unnecessarily.

Referring now to Figure 1, a block diagram of a computer system is shown,
including a host processor 20, a memory subsystem 22, a graphics
accelerator 24, and a display device 26. The host processor 20, the
memory subsystem 22, and the graphics accelerator 24 are each coupled for
communication over a host bus 28.

The display device 26 represents a wide variety of raster display
monitors. The host processor 20 represents a wide variety of computer
processors, multiprocessors and CPUs, and the memory subsystem 22
represents a wide variety of memory subsystems including random access
memories and mass storage devices. The host bus 28 represents a wide
variety of communication or host computer busses for communication
between host processors, CPUs, and memory subsystems, as well as
specialized subsystems.

The host processor 20 transfers information to and from the graphics
accelerator 24 according to a programmed input/output (I/O) protocol over
the host bus 28. Also, the graphics accelerator 24 accesses the memory
subsystem 22 according to a direct memory access (DMA) protocol.

A graphics application program executing on the host processor 20
generates geometry data arrays containing three dimensional geometry
information that define an image for display on the display device 26.
The host processor 20 transfers the geometry data arrays to the memory
subsystem 22. Thereafter, the graphics accelerator 24 reads in the
geometry data arrays using DMA access cycles over the host bus 28.
Alternatively, the host processor 20 transfers the geometry data arrays
to the graphics accelerator 24 with programmed I/O over the host bus 28.

The three dimensional geometry information in the geometry data arrays
comprises a stream of input vertex packets containing vertex coordinates
(vertices), vertex position, and other information that defines
triangles, vectors, and points in a three dimensional space, which is
commonly referred to as model space. Each input vertex packet may contain
any combination of three dimensional vertex information, including vertex
normal, vertex color, facet normal, facet color, texture map coordinates,
pick-id's, headers, and other information.

A headerless input vertex packet may define a triangle strip in the form
of a "zig zag" pattern of adjacent triangles. A headerless input vertex
packet may also define a triangle strip in the form of a "star strip"
pattern of triangles. In addition, a headerless input vertex packet may
define a strip of isolated triangles. An input vertex packet having a
header may change triangle strip formats for each triangle, and may
change between "zig zag" format, "star" format, and isolated triangles.

Figure 2 is a block diagram of the graphics accelerator 24. The graphics
accelerator 24 is comprised of a command preprocessor 30, a set of
floating-point processors 40-43, a set of draw processors 50-54, a frame
buffer 100, a post-processor 70, and a random access
memory/digital-to-analog converter (RAMDAC) 72. The RAMDAC 72 is similar
to commercially available RAMDACs that implement look-up table
functions.

For one embodiment, the command preprocessor 30, the floating-point
processors 40-43, the draw processors 50-54, and the post-processor 70
are each individual integrated circuit chips.

The command preprocessor 30 is coupled for communication over the host
bus 28. The command preprocessor 30 performs DMA reads of the geometry
data arrays from the memory subsystem 22 over the host bus 28. The host
processor 20 transfers virtual memory pointers to the command
preprocessor 30. The virtual memory pointers point to the geometry data
arrays in the memory subsystem 22. The command preprocessor 30 converts
the virtual memory pointers to physical memory addresses for performing
the DMA reads to the memory subsystem 22 without intervention from the
host processor 20.
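The specification does not say how the command preprocessor 30 performs the virtual-to-physical conversion. As a hedged illustration only, the following Python sketch assumes a page-table lookup with 4 KB pages and shows why a DMA read of one geometry data array may have to be split at page boundaries. The names `PAGE_SIZE`, `page_table`, and `dma_segments` are hypothetical and are not taken from the source.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumed page size; the source does not specify one


def dma_segments(vaddr: int, length: int,
                 page_table: Dict[int, int]) -> List[Tuple[int, int]]:
    """Split a virtual buffer into (physical_address, byte_count) DMA segments.

    page_table is a hypothetical map from virtual page number to physical
    page number. Physical contiguity is only guaranteed within one page.
    """
    segments: List[Tuple[int, int]] = []
    while length > 0:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in page_table:
            raise LookupError(f"virtual page {vpn:#x} is not mapped")
        chunk = min(length, PAGE_SIZE - offset)  # stop at the page boundary
        segments.append((page_table[vpn] * PAGE_SIZE + offset, chunk))
        vaddr += chunk
        length -= chunk
    return segments
```

For example, a 512-byte array starting 256 bytes before a page boundary yields two 256-byte segments that may land in unrelated physical pages.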

The command preprocessor 30 implements two data pipelines: a 3D geometry
pipeline and a direct port pipeline.

In the direct port pipeline, the command preprocessor 30 receives direct
port data over the host bus 28, and transfers the direct port data over a
command-to-draw bus (CD-BUS) 80 to the draw processors 50-54. The direct
port data is optionally processed by the command preprocessor 30 to
perform X11 functions such as character writes, screen scrolls, and block
moves in concert with the draw processors 50-54. The direct port data may
also include register writes to the draw processors 50-54, and individual
pixel writes to the frame buffer 100.

In the 3D geometry pipeline, the command preprocessor 30 accesses a
stream of input vertex packets from the geometry data arrays, reorders
the information contained within the input vertex packets, and optionally
deletes information in the input vertex packets. The command preprocessor
30 reorders the information from the input vertex packet into reformatted
vertex packets having a standardized element order. The command
preprocessor 30 then transfers output geometry packets over a
command-to-floating-point bus (CF-BUS) 82 to one of the floating-point
processors 40-43. The output geometry packets comprise the reformatted
vertex packets with optional modifications and data substitutions.

The command preprocessor 30 converts the information in each input
vertex packet from differing number formats into the 32 bit IEEE
floating-point number format. The command preprocessor 30 converts 8 bit
fixed-point numbers, 16 bit fixed-point numbers, and 32 bit or 64 bit
IEEE floating-point numbers.
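To make the conversion concrete, the following Python sketch converts one element of each supported input format into a 32 bit IEEE value. It is a minimal sketch under stated assumptions: the source does not say how fixed-point inputs are scaled, so 8 bit and 16 bit fixed-point values are assumed here to be unsigned fractions in [0, 1]; the byte order is assumed to be big-endian; and the helper name `to_float32` and its format tags are hypothetical.

```python
import struct


def to_float32(raw: bytes, fmt: str) -> float:
    """Convert one input element to a 32 bit IEEE value (hypothetical helper).

    fmt is an illustrative tag: "u8" or "u16" (fixed-point, assumed to be
    unsigned fractions in [0, 1]), "f32" or "f64" (IEEE floating point).
    Byte order is assumed to be big-endian.
    """
    if fmt == "u8":
        value = raw[0] / 255.0
    elif fmt == "u16":
        value = struct.unpack(">H", raw)[0] / 65535.0
    elif fmt == "f32":
        value = struct.unpack(">f", raw)[0]
    elif fmt == "f64":
        value = struct.unpack(">d", raw)[0]
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Re-pack to round the result to single precision.
    return struct.unpack(">f", struct.pack(">f", value))[0]
```

Note that a 64 bit input loses precision in the final rounding step: 0.1 comes back as the nearest single-precision value, not as 0.1 exactly.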

The command preprocessor 30 either reformats or inserts header fields,
inserts constants, generates and inserts sequential pick-id's, and
optionally inserts constant sequential pick-id's. The command
preprocessor 30 examines the chaining bits of the header and reassembles
the information from the input vertex packets into reformatted vertex
packets containing completely isolated geometry primitives, including
points, lines, and triangles.

The command preprocessor 30 receives control and status signals from the
floating-point processors 40-43 over a control portion of the CF-BUS 82.
The control and status signals indicate the availability of input
buffers within the floating-point processors 40-43 for receiving the
output geometry packets.

The floating-point processors 40-43 are each substantially similar. Each
floating-point processor 40-43 implements a 32 bit micro-code driven
floating-point core, along with parallel input and output packet
communication hardware. Each of the floating-point processors 40-43
implements floating-point functions including multiply, ALU, reciprocal,
reciprocal-square-root, and integer operations. Each floating-point
processor 40-43 implements a wide assortment of specialized graphics
instructions and features. Each floating-point processor 40-43 is
optimized to implement the number of fast internal registers required to
perform the largest common three dimensional graphics processing
micro-code inner loop implemented by the graphics accelerator 24.

For one embodiment, each floating-point processor 40-43 is implemented on
a single integrated circuit chip. The only support chips required for
each floating-point processor 40-43 are a set of four external SRAM chips
that provide an external micro-code control store (CS).

Each floating-point processor 40-43 implements a function for setting up
triangles for scan conversion by the draw processors 50-54. The first
step of the setup function sorts the three vertices of a triangle in
ascending y order. Each floating-point processor 40-43 broadcasts draw
packets to all of the draw processors 50-54 over the CD-BUS 80. The draw
packets comprise final geometry primitives, including triangles, points,
and lines.
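The first setup step can be sketched in Python as three compare-and-exchange operations, which form a minimal sorting network for three items. The source states only that the vertices are sorted in ascending y order. The vertex representation (an `(x, y)` tuple) and the function name `sort_vertices_by_y` are assumptions made for illustration; in practice each vertex would also carry color, depth, and other attributes.

```python
from typing import List, Tuple

Vertex = Tuple[float, float]  # (x, y) in screen space; other attributes omitted


def sort_vertices_by_y(v0: Vertex, v1: Vertex, v2: Vertex) -> List[Vertex]:
    """First setup step: order a triangle's vertices by ascending y."""
    a, b, c = v0, v1, v2
    # Compare-exchange (a, b), then (b, c), then (a, b) again.
    if a[1] > b[1]:
        a, b = b, a
    if b[1] > c[1]:
        b, c = c, b
    if a[1] > b[1]:
        a, b = b, a
    return [a, b, c]
```

After sorting, the first and last vertices bound the triangle vertically, so one edge spans the full height of the triangle and the other two edges meet at the middle vertex, which is the structure an edgewalker relies on.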

The draw processors 50-54 function as VRAM control chips for the frame
buffer 100. The draw processors 50-54 concurrently render an image into
the frame buffer 100 according to a draw packet received from one of the
floating-point processors 40-43, or according to a direct port packet
received from the command preprocessor 30.

Each draw processor 50-54 performs both scan conversion functions:
edgewalking and scan interpolation. The replication of the edgewalking
and scan interpolation functions among the draw processors 50-54
obviates the need for large scale communication pathways between
separate edgewalking and scan interpolation processors, thereby
minimizing the pin count of each of the draw processors 50-54 and
decreasing printed circuit board space requirements.

The frame buffer 100 is arranged as a set of five VRAM interleave banks.
The draw processor 50 writes pixel data into an interleave bank_0 61, the
draw processor 51 writes pixel data into an interleave bank_1 62, the
draw processor 52 writes pixel data into an interleave bank_2 63, the
draw processor 53 writes pixel data into an interleave bank_3 64, and
the draw processor 54 writes pixel data into an interleave bank_4 65.

Each draw processor 50-54 renders only the pixels visible within the
corresponding interleave bank 61-65. The draw processors 50-54
concurrently render the triangle primitive defined by a draw packet to
produce the correct combined rasterized image in the frame buffer 100.
Each draw processor 50-54 rasterizes every fifth pixel along each scan
line of the final rasterized image. Each draw processor 50-54 starts a
scan line biased by 0, 1, 2, 3, or 4 pixel spaces to the right.
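The division of work among the five draw processors can be sketched in Python. Each call to the hypothetical `render_share` plays the role of one draw processor: it walks the same triangle edges as every other processor, but its scan interpolation starts at the first pixel of its own bank and then steps five pixels at a time. This is a minimal sketch under stated assumptions, not the hardware algorithm: pixel x is assumed to belong to bank x mod 5, pixel centers are assumed to sit at integer coordinates, a single shade value stands in for all interpolated parameters, and edge crossings are recomputed per scan line rather than stepped incrementally.

```python
import math
from typing import Dict, List, Tuple

NUM_DRAW_PROCESSORS = 5  # one draw processor per interleave bank

Vertex = Tuple[float, float, float]  # (x, y, shade); hypothetical layout


def edge_at(p: Vertex, q: Vertex, y: float) -> Tuple[float, float]:
    """Edgewalking step: x and shade where edge p->q crosses scan line y."""
    t = (y - p[1]) / (q[1] - p[1])
    return p[0] + t * (q[0] - p[0]), p[2] + t * (q[2] - p[2])


def render_share(tri: List[Vertex], k: int) -> Dict[Tuple[int, int], float]:
    """Pixels (x, y) -> shade that draw processor k writes into bank k."""
    top, mid, bot = sorted(tri, key=lambda v: v[1])  # ascending y, as in setup
    pixels: Dict[Tuple[int, int], float] = {}
    y = math.ceil(top[1])
    while y < bot[1]:
        # Every processor walks the same edges: the long edge top->bot and
        # whichever short edge spans this scan line.
        xa, sa = edge_at(top, bot, y)
        xb, sb = edge_at(top, mid, y) if y < mid[1] else edge_at(mid, bot, y)
        if xa > xb:
            xa, sa, xb, sb = xb, sb, xa, sa
        # Scan interpolation starts at the first pixel in this processor's
        # bank (x % 5 == k) and then steps five pixels at a time.
        x = math.ceil(xa)
        x += (k - x) % NUM_DRAW_PROCESSORS
        while x < xb:
            pixels[(x, y)] = sa + (x - xa) / (xb - xa) * (sb - sa)
            x += NUM_DRAW_PROCESSORS
        y += 1
    return pixels
```

Running `render_share(tri, k)` for k = 0 through 4 produces five disjoint pixel sets whose union is the full rasterized triangle, which is the property the interleaved frame buffer depends on.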

Each draw processor 50-54 optionally performs depth cueing. Each pixel of
a triangle, vector, or dot rendered may be depth cued within the draw
processors 50-54 without the performance penalty of prior graphics
systems that perform depth cueing in floating-point processors. Each draw
processor 50-54 optionally performs rectangular window clipping,
blending, and other pixel processing functions.
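The source does not give a depth cueing formula. As an illustration only, the following Python sketch applies a common linear formulation per pixel: the pixel color is blended toward a depth-cue color according to how far the pixel lies between a front and a back reference plane. The plane parameters, the scale factors `f_front` and `f_back`, the convention that larger z is nearer the viewer, and the name `depth_cue` are all assumptions.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def depth_cue(color: Color, z: float, z_front: float, z_back: float,
              cue_color: Color, f_front: float = 1.0,
              f_back: float = 0.0) -> Color:
    """Blend a pixel toward cue_color with distance (assumed linear model).

    Larger z is assumed to be nearer the viewer (z_front > z_back).
    """
    # Fraction of the way from the back plane to the front plane, clamped.
    t = (z - z_back) / (z_front - z_back)
    t = min(max(t, 0.0), 1.0)
    f = f_back + t * (f_front - f_back)  # weight of the original color
    return tuple(f * c + (1.0 - f) * q for c, q in zip(color, cue_color))
```

Because this is a short per-pixel multiply-add, it fits naturally after scan interpolation in a draw processor, which is consistent with the source's point that doing it there avoids a floating-point processor bottleneck.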

The post-processor 70 receives interleaved pixel data from the frame
buffer 100 over the video bus 84. The post-processor 70 performs color
look-up table and cursor functions. The RAMDAC 72 converts the pixel data
received from the post-processor 70 into video signals 73 for the display
device 26.

Figure 3 is a block diagram of the command preprocessor 30. The command
preprocessor 30 is shown coupled to the host bus 28 for communication
through the 3D geometry pipeline and the direct port pipeline. For one
embodiment, the command preprocessor 30 is implemented on a single
integrated circuit chip.

The direct port pipeline comprises an input interface 541 and an X11
operations circuit 551. The input interface 541 receives direct port data
over the host bus 28, and transfers the direct port data over the CD-BUS
80 to the draw processors 50-54. The direct port data includes register
writes to the draw processors 50-54 and individual pixel writes to the
frame buffer 100. The direct port data is optionally transferred to the
X11 operations circuit 551 to perform X11 functions such as character
writes, screen scrolls, and block moves in concert with the draw
processors 50-54.

The 3D geometry pipeline comprises the input interface 541, a bucket
buffer 542, a format converter 543, and a vertex buffer comprising a set
of vertex registers 549 and alternate tupple registers 540. Format
conversion in the 3D geometry pipeline is controlled by a VCS operations
circuit 545 and a converter sequencer 544. Output geometry packets are
assembled by a primitive assembly circuit 547 and a sequencer 548. A
32-16 circuit 550 optionally performs data compression. A set of internal
registers 552 is programmed over the host bus 28 to control the
operations of the 3D geometry pipeline and the direct port pipeline. A
DMA controller 546 performs DMA transfers into the bucket buffer 542 over
the host bus 28.

The input interface 541 contains a burst buffer for interfacing between
the differing docking environments of the host bus 28 and the command
preprocessor 30. The burst buffer functions as a set of temporary holding
registers for input vertex packets transferred into the bucket buffer
542.

The format converter circuit 543 accesses the input vertex packets from
the bucket buffer 542, and assembles the reformatted vertex packets into
the vertex registers 549. The format converter circuit 543 is controlled
by the VCS operations circuit 545 according to preprogrammed format
conversion operations. The format conversion is sequenced by the
converter sequencer 544.

The primitive assembly circuit 547, under control of the sequencer 548,
accesses the reformatted vertex packets from the vertex registers 549,
and transfers the output geometry packets over the CF-BUS 82. The
primitive assembly circuit 547 optionally substitutes alternate tupples
from the alternate tupple registers 540. The primitive assembly circuit
547 also optionally performs data compression on data in the output
geometry packets using the 32-16 circuit 550.

The format converter 543 processes input vertex packets that define a
triangle strip. Header bits in each input vertex packet specify a
replacement type. The replacement type defines how a subsequent input
vertex packet is combined with previous input vertex packets to form the
next triangle in the triangle strip. The format converter 543 implements
a register stack that holds the last three vertices in the triangle
strip. The format converter 543 labels the last three vertices in the
triangle strip as the oldest, the middlest, and the newest.

A triangle strip with a "zig-zag" pattern corresponds to a new input
vertex packet having a header that specifies the replacement type
replace_oldest. The replacement type replace_oldest causes the format
converter 543 to replace the oldest vertex by the middlest, to replace
the middlest vertex by the newest, and to set the newest vertex to the
vertex in the new input vertex packet. The foregoing pattern corresponds
to a PHIGS PLUS simple triangle strip.

A triangle strip with a "star" pattern corresponds to a new input vertex
packet having a header that specifies the replacement type
replace_middlest. The replacement type replace_middlest causes the format
converter 543 to leave the oldest vertex unchanged, to replace the
middlest vertex by the newest vertex, and to set the newest vertex to
the vertex in the new input vertex packet.

To begin a generalized triangle strip, a new input vertex packet has a
header that specifies the replacement type restart. The replacement type
restart causes the format converter 543 to mark the oldest and the
middlest vertices as invalid, and to set the newest vertex to the vertex
in the new input vertex packet.

The primitive assembly circuit 547 transfers an output geometry packet
for a triangle from the vertex registers 549 and alternate tupple
registers 540 over the CF-BUS 82 whenever a replacement operation
generates three valid vertices in the vertex registers 549.
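The three replacement types and the emission rule can be modeled with a small Python class. This is a minimal sketch, not the circuit: the class name `StripAssembler`, the string tags for the replacement types, and the vertex tuples are hypothetical, and the face-ordering normalization that the hardware also applies is deliberately left out. The model keeps the oldest, middlest, and newest vertices, applies each replacement exactly as described above, and returns a triangle whenever all three entries are valid.

```python
from typing import Optional, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[Vertex, Vertex, Vertex]


class StripAssembler:
    """Hypothetical model of the format converter's three-entry vertex stack."""

    def __init__(self) -> None:
        self.oldest: Optional[Vertex] = None
        self.middlest: Optional[Vertex] = None
        self.newest: Optional[Vertex] = None

    def push(self, replacement: str, vertex: Vertex) -> Optional[Triangle]:
        """Apply one input vertex packet; return a triangle if one completes."""
        if replacement == "replace_oldest":      # zig-zag strip
            self.oldest, self.middlest = self.middlest, self.newest
        elif replacement == "replace_middlest":  # star strip
            self.middlest = self.newest
        elif replacement == "restart":           # begin a new strip
            self.oldest = self.middlest = None
        else:
            raise ValueError(f"unknown replacement type: {replacement}")
        self.newest = vertex
        # A triangle is emitted whenever all three entries are valid.
        if self.oldest is not None and self.middlest is not None:
            return (self.oldest, self.middlest, self.newest)
        return None
```

As a usage example, the sequence restart A, replace_oldest B, replace_oldest C emits triangle (A, B, C). A following replace_middlest D then emits (A, C, D), which is a strip switching from the "zig-zag" pattern to the "star" pattern partway through, as the replacement types allow.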

The restart replacement type in the header of an input vertex packet
corresponds to a move operation for polylines. The restart replacement
type enables a single data structure, the geometry data array in the
memory subsystem 22, to specify multiple unconnected variable length
triangle strips. Such a capability reduces the overhead required for
starting a DMA sequence over the host bus 28.

The replacement types in the input vertex packets received by the
command preprocessor 30 from the geometry data array in the memory
subsystem 22 enable a triangle strip to change from a "zig zag" pattern
to a "star" pattern in the middle of the strip. Such a capability enables
the representation of complex geometry in a compact data structure while
requiring minimal input data bandwidth over the host bus 28.

The format converter 543 rearranges the vertex order in the vertex
registers 549 after every replace_oldest replacement type to normalize
the facing of the output triangles in the reformatted vertex packets. The
primitive assembly circuit 547 rearranges the vertex order as the vertex
is transferred out of the vertex registers 549, such that the front face
of the output triangle is always defined by a clockwise vertex order.

A header bit in an input vertex packet specifies an initial face ordering
of each triangle strip. In addition, the command preprocessor 30 contains
a register with a state bit which causes reversal of the initial face
ordering specified in the header. An application program executing on
the host processor 20 maintains the state bit to reflect a model matrix
maintained by the application program. Also, the command preprocessor 30
reverses the face ordering for each successive triangle in a "zig-zag"
pattern.
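The facing rules above can be illustrated with a short Python sketch. The source does not detail the mechanism, so the following is a hedged model only: the header bit gives the initial ordering, the state bit reverses it, the ordering alternates on each successive zig-zag triangle, and swapping the last two vertices reverses a triangle's winding. The names `is_clockwise` and `normalize_zigzag_facing` are hypothetical, and the orientation test assumes 2D coordinates with the y axis pointing up.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def is_clockwise(a: Point, b: Point, c: Point) -> bool:
    """True if a->b->c turns clockwise (y axis pointing up)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]) < 0


def normalize_zigzag_facing(triangles: Iterable[Triangle],
                            initial_clockwise: bool,
                            reverse_bit: bool) -> List[Triangle]:
    """Emit zig-zag strip triangles with a consistent front-face winding."""
    # The application-maintained state bit flips the header's ordering.
    clockwise = initial_clockwise != reverse_bit
    out: List[Triangle] = []
    for a, b, c in triangles:
        # Swapping the last two vertices reverses the winding.
        out.append((a, b, c) if clockwise else (a, c, b))
        clockwise = not clockwise  # facing alternates on every zig-zag triangle
    return out
```

For a strip (0, 0), (0, 1), (1, 0), (1, 1), (2, 0) with a clockwise initial ordering and the state bit clear, every emitted triangle tests clockwise. Setting the state bit, as an application might after a mirroring model matrix, flips every triangle to counterclockwise.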

The primitive assembly circuit 547 transfers each reformatted vertex
packet from the vertex registers 549 to a next available floating-point
processor 40-43. The next available floating-point processor 40-43 is
determined by sensing the input buffer status of each floating-point
processor 40-43 over a control portion of the CF-BUS 82.

The command preprocessor 30 maintains a record or "scoreboard" of the
order in which each reformatted vertex packet is transferred to the
floating-point processors 40-43. The command preprocessor 30 controls the
output buffers of the floating-point processors 40-43 by transferring
control signals over a control portion of the CD-BUS 80. The command
preprocessor 30 ensures that the reformatted vertex packets are processed
through the floating-point processors 40-43 in the proper order when a
sequential rendering order is required. If sequential rendering is not
required, then the first draw packet at the output of the floating-point
processors 40-43 is rendered first.
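The scoreboard behavior can be sketched in Python as a first-in, first-out record of which floating-point processor received each packet. This is a simplified, hypothetical model: the class name `Scoreboard` and its methods are illustrative, each processor is assumed to hold only one packet at a time, and draw packets are represented as strings. In sequential mode, a finished draw packet is released only when every earlier packet has been released. Otherwise, finished packets are released in completion order.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class Scoreboard:
    """Hypothetical model of packet ordering across floating-point processors."""

    def __init__(self, num_fp: int = 4, sequential: bool = True) -> None:
        self.sequential = sequential
        self.busy = [False] * num_fp      # one packet per processor, for simplicity
        self.order: Deque[int] = deque()  # processor index of each packet, in issue order
        self.done: Dict[int, str] = {}    # processor index -> finished draw packet

    def dispatch(self) -> Optional[int]:
        """Send the next packet to the first available processor, if any."""
        for fp, busy in enumerate(self.busy):
            if not busy:
                self.busy[fp] = True
                self.order.append(fp)
                return fp
        return None  # all input buffers full: the preprocessor must wait

    def finish(self, fp: int, draw_packet: str) -> None:
        """Record that processor fp has a draw packet waiting in its output buffer."""
        self.done[fp] = draw_packet

    def release(self) -> List[str]:
        """Return the draw packets that may now be sent to the draw processors."""
        out: List[str] = []
        if self.sequential:
            # Only the oldest outstanding packet may go next, preserving issue order.
            while self.order and self.order[0] in self.done:
                fp = self.order.popleft()
                out.append(self.done.pop(fp))
                self.busy[fp] = False
        else:
            # Any finished packet may go, in completion order.
            for fp in list(self.done):
                self.order.remove(fp)
                out.append(self.done.pop(fp))
                self.busy[fp] = False
        return out
```

In sequential mode, if processor 1 finishes before processor 0, its packet is held until processor 0's packet has been released. The hardware presumably achieves the same effect by gating the floating-point processors' output buffers over the CD-BUS control lines, as the text describes.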

The format converter 543 also reformats polylines and poly-polylines. In
addition, the format converter 543 optionally converts triangle strip
data into polyline edges. Such a capability reduces the complexity of the
micro-code for the floating-point processors 40-43 because triangle
processing is not mixed with line processing during operations that
require triangle edge highlighting.

To process edge highlighting of triangles within a triangle strip, the
command preprocessor 30 assembles the input vertex packets for the
triangle strip into reformatted vertex packets, and passes the
reformatted vertex packets to the floating-point processors 40-43 over
the CF-BUS 82 as output geometry packets. Thereafter, the command
preprocessor 30 accesses the original triangle strip input vertex packets
over the host bus 28, and assembles the input vertex packets into
reformatted vertex packets containing isolated vectors representing
highlighted edges. The command preprocessor 30 then processes the
isolated vectors through the floating-point processors 40-43 and the
draw processors 50-54 to perform the highlighting function.
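The second pass, which turns a strip into isolated vectors, can be sketched in Python for the "zig-zag" case. The source does not say whether shared interior edges are highlighted or how duplicates are avoided. This sketch assumes every edge is emitted exactly once and represents vertices by their index in the strip. The function name `zigzag_strip_edges` is hypothetical.

```python
from typing import List, Tuple


def zigzag_strip_edges(n: int) -> List[Tuple[int, int]]:
    """Each edge, as a vertex-index pair, of an n-vertex zig-zag strip, once."""
    if n < 3:
        return []
    # Triangle i uses vertices (i, i + 1, i + 2). Edges between consecutive
    # vertices are shared by neighboring triangles (except the first and
    # last), so listing them once per index avoids drawing any edge twice.
    edges = [(i, i + 1) for i in range(n - 1)]
    # Edges that skip one vertex form the two outer sides of the strip.
    edges += [(i, i + 2) for i in range(n - 2)]
    return edges
```

An n-vertex zig-zag strip therefore yields 2n - 3 isolated vectors. For example, four vertices give the five edges (0, 1), (1, 2), (2, 3), (0, 2), and (1, 3).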

For one embodiment, the data portion of the CF-BUS 82 is 16 bits wide,
and the data portion of the CD-BUS 80 is 16 bits wide. The command
preprocessor 30 optionally compresses color and normal data components of
the reformatted vertex packets using the 32-16 circuit 550 before
transfer to the floating-point processors 40-43 over the CF-BUS 82. The
32-16 circuit 550 compresses the color and normal data from 32 bit IEEE
floating-point format into 16 bit fixed-point format. Thereafter, the
floating-point processors 40-43 receive the reformatted vertex packets
with the compressed color and normal data components, and decompress the
color and normal components back into 32 bit IEEE floating-point values.
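A round trip through a 32-16 style compression can be sketched in Python. The source states only that color and normal components are compressed from 32 bit floating point to 16 bit fixed point and later decompressed, so the encodings here are assumptions: color components are taken to lie in [0, 1], and normal components are taken to lie in [-1, 1] and are shifted by an offset into the unsigned 16 bit range. All function names are hypothetical.

```python
import struct

SCALE = 65535  # largest unsigned 16 bit value


def to_f32(x: float) -> float:
    """Round a Python float to the nearest 32 bit IEEE value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def compress_color(c: float) -> int:
    """Color component in [0, 1] -> unsigned 16 bit fixed-point (assumed encoding)."""
    return round(min(max(c, 0.0), 1.0) * SCALE)


def decompress_color(q: int) -> float:
    return to_f32(q / SCALE)


def compress_normal(n: float) -> int:
    """Normal component in [-1, 1] -> unsigned 16 bit via a +1 offset (assumed)."""
    return round((min(max(n, -1.0), 1.0) + 1.0) / 2.0 * SCALE)


def decompress_normal(q: int) -> float:
    return to_f32(q / SCALE * 2.0 - 1.0)
```

Under this assumed normal encoding, the quantization step is 2/65535, or about 3.05 × 10^-5, so the worst-case error is about ±1.53 × 10^-5. That is close to one part in 63,360 (one inch in a mile, about 1.58 × 10^-5), which is in line with the resolution figure quoted in the following paragraph. Sixteen bits of color also comfortably exceed the eight bits stored per component in the frame buffer.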

The compression of color and normal data components of the reformatted
vertex packets does not substantially affect the ultimate image quality
for the graphics accelerator 24, because the color components of the
reformatted vertex packets are represented as eight bit values in the
frame buffer 100. Similarly, normal components of the reformatted vertex
packets having 16 bit unsigned accuracy represent a resolution of
approximately plus or minus one inch at one mile. On the other hand, the
data compression of color and normal components of the reformatted
vertex packets reduces the data transfer bandwidth over the CF-BUS 82 by
approximately 25 percent.

Figure 4 is a block diagram of the floating-point processor section 45,
which includes the floating-point processor 40 and a control store (CS)
149. The floating-point processor 40 is comprised of an input circuit
141, an output circuit 145, a register file 142, a set of functional
units 143, a control circuit 144, and an SRAM interface circuit 146. The
floating-point processor 40 implements an internal subroutine stack and
block load/store instructions for transfers to the CS 149, as well as
integer functions.
The floating-point processor 40 receives the output geometry packets over a data portion 181 of the CF-BUS 82. The command preprocessor 30 transfers control signals over a control portion 182 of the CF-BUS 82 to enable and disable the input buffer 141.

The functional units 143 implement a floating-point multiplier, a floating-point ALU, a floating-point reciprocal operation, a reciprocal square-root operation, and an integer ALU. The output circuit 145 transfers draw packets over a data portion 183 of the CD-BUS 80. The output circuit 145 also transfers control signals over a control portion 184 of the CD-BUS 80 to synchronize data transfer to the draw processors 50-54 and to coordinate bus activity on the CD-BUS 80 with the command preprocessor 30.
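A common use of a reciprocal square-root unit in three dimensional geometry is normalizing vectors such as surface normals. One reciprocal square root followed by three multiplies replaces a square root and three divides. The text does not say how the microcode uses the unit, so treat this as an illustrative assumption. Here `rsqrt` is a hypothetical software stand-in for the hardware operation.

```python
import math

def rsqrt(v: float) -> float:
    """Software stand-in for the hardware reciprocal square-root unit."""
    return 1.0 / math.sqrt(v)

def normalize(nx: float, ny: float, nz: float):
    """Normalize a vector using one rsqrt and three multiplies instead of three divides."""
    inv_len = rsqrt(nx * nx + ny * ny + nz * nz)   # dot product: multiplier + ALU
    return (nx * inv_len, ny * inv_len, nz * inv_len)
```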

For one embodiment, the input circuit 141 and the output circuit 145 each contain 64 registers for buffering geometry data. The register file 142 comprises one hundred and sixty 32-bit registers.

The SRAM interface 146 communicates with the control store (CS) 149 over a control store address bus 147 and a control store data bus 148. For one embodiment, the control store address bus 147 is 17 bits wide and the control store data bus 148 is 32 bits wide. The control store 149 comprises four 128K by 8-bit SRAMs.
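The bus widths and SRAM organization are consistent with each other. Seventeen address bits select 128K words, and four 8-bit-wide SRAMs side by side supply one 32-bit word. The byte-lane order in `split_word` is an assumption for illustration.

```python
ADDRESS_BITS = 17        # control store address bus 147
SRAM_COUNT = 4           # four 128K x 8 SRAMs
SRAM_WIDTH_BITS = 8

words = 2 ** ADDRESS_BITS                  # 131,072 = 128K addressable words
word_bits = SRAM_COUNT * SRAM_WIDTH_BITS   # 32 bits, matching control store data bus 148
total_bytes = words * word_bits // 8       # 524,288 bytes = 512 KB of microcode storage

def split_word(word: int):
    """Byte held by each SRAM for one 32-bit word (lane 0 = least significant byte, assumed)."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(SRAM_COUNT)]
```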

The registers contained in the input circuit 141 are arranged as a pair of 32-register files in a double-buffered fashion. Similarly, the registers contained in the output circuit 145 are arranged as a pair of 32-register double-buffered register files. The microcode executing on the floating-point processor 40 accesses the registers of the input circuit 141 and the output circuit 145 as special register files. The instruction set for the floating-point processors 40 includes commands for requesting and for relinquishing the register files, as well as commands for queuing completed data packets for transmission over the CD-BUS 80.
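The request/relinquish discipline on a double-buffered pair of register files can be modeled as a ping-pong buffer. In this model, the bus side fills one 32-register bank while the microcode consumes the other. The sender sees a buffer status that says whether a bank is free. The class and method names are hypothetical and are not the processor's actual instruction mnemonics.

```python
class DoubleBufferedRegisterFile:
    """Hypothetical model of a pair of 32-register files used in ping-pong fashion."""

    SIZE = 32

    def __init__(self):
        self.banks = [[0] * self.SIZE, [0] * self.SIZE]
        self.full = [False, False]   # True while a bank holds an unconsumed packet
        self.fill_bank = 0           # bank the bus side loads next
        self.work_bank = 0           # bank the microcode consumes next

    def bank_available(self) -> bool:
        """Buffer status seen by the sender: is there an empty bank to load?"""
        return not self.full[self.fill_bank]

    def load(self, packet):
        """Bus side: copy a packet into the free bank, then switch to the other bank."""
        if len(packet) > self.SIZE:
            raise ValueError("packet larger than one register file")
        if not self.bank_available():
            raise RuntimeError("no free register file; sender must wait")
        self.banks[self.fill_bank][:len(packet)] = packet
        self.full[self.fill_bank] = True
        self.fill_bank ^= 1

    def request(self):
        """Microcode side: return the next full bank, or None if none is ready yet."""
        return self.banks[self.work_bank] if self.full[self.work_bank] else None

    def relinquish(self):
        """Microcode side: release the current bank so the bus side can refill it."""
        if not self.full[self.work_bank]:
            raise RuntimeError("no bank is currently held")
        self.full[self.work_bank] = False
        self.work_bank ^= 1
```

While the microcode processes one bank, the next packet can already be arriving in the other bank. That overlap is what double buffering provides.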

The floating-point processor 40 implements the triangle setup function for scan conversion by the draw processors 50-54. The first stage of the triangle setup function sorts the three vertices of a triangle in ascending y order. The floating-point processor 40 implements a special instruction that reorders a section of the register file 142 in hardware based upon the results of the last three comparisons of the y coordinates of the vertices.
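Three pairwise y comparisons fully determine the ascending order of three vertices. Of the eight possible result patterns, two are logically impossible. The reorder can therefore be a table lookup on the three comparison bits rather than a general sort. The sketch below assumes the comparisons (y0 > y1, y1 > y2, y0 > y2) and this bit packing; the hardware's actual encoding is not given.

```python
# Ascending-y vertex order for each pattern of (y0 > y1, y1 > y2, y0 > y2),
# packed as three bits. Patterns 0b001 and 0b110 are logically impossible.
_ORDER = {
    0b000: (0, 1, 2),
    0b010: (0, 2, 1),
    0b011: (2, 0, 1),
    0b100: (1, 0, 2),
    0b101: (1, 2, 0),
    0b111: (2, 1, 0),
}

def sort_vertices_by_y(v0, v1, v2):
    """Sort three (x, y) vertices by ascending y using three comparisons and one lookup."""
    key = (int(v0[1] > v1[1]) << 2) | (int(v1[1] > v2[1]) << 1) | int(v0[1] > v2[1])
    vs = (v0, v1, v2)
    return [vs[i] for i in _ORDER[key]]
```

Ties keep the original vertex order, so the result matches a stable sort.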

A clip testing function implemented in the floating-point processors 40-43 computes a vector of clip condition bits. The floating-point processors 40-43 implement a special clip test instruction that computes pairs of the clip condition bits while shifting the clip condition bits into a special clip register. After the clip condition bits have been computed, special branch instructions decode the clip condition bits contained in the clip register into the appropriate clip condition. The floating-point processor 40 implements separate branch instructions for clipping triangles and vectors. The special branch instructions enable testing of multiple clip conditions within the same instruction.
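The clip test can be sketched in the style of Cohen-Sutherland outcodes. For each coordinate, one pair of bits records whether the point lies below -w or above +w, and the pair is shifted into a register. The decode step then classifies a triangle as trivially rejected, trivially accepted, or needing clipping. The homogeneous clip volume, the bit layout and the three outcomes are assumptions for illustration. The patent does not specify the exact conditions its branch instructions decode.

```python
def clip_test(x: float, y: float, z: float, w: float) -> int:
    """Shift one pair of clip bits per axis into a 'clip register' (assumed layout).

    Result bits 5-4 hold the x pair, 3-2 the y pair, 1-0 the z pair;
    within a pair, the high bit means c < -w and the low bit means c > w.
    """
    reg = 0
    for c in (x, y, z):
        reg = (reg << 2) | (int(c < -w) << 1) | int(c > w)
    return reg

def classify_triangle(codes) -> str:
    """Decode three vertex clip codes into a clip condition (the role of the branches)."""
    if codes[0] & codes[1] & codes[2]:
        return "reject"   # every vertex is outside the same clip plane
    if (codes[0] | codes[1] | codes[2]) == 0:
        return "accept"   # every vertex is inside the clip volume
    return "clip"         # the triangle straddles at least one plane
```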

Figure 5 is a block diagram of the draw processor 50. The draw processor 50 comprises an input buffer 151, a rendering circuit 152, and a memory control circuit 153. The input buffer 151 provides a double-buffered arrangement for receiving geometry data over a data portion 185 of the CD-BUS 80. The input buffer 151 also transfers control signals over a control portion 186 of the CD-BUS 80 to coordinate data transfer with the command preprocessor 30 and the floating-point processors 40-43. The input buffer 151 is arranged such that new geometry data is loaded into the input buffer 151 while old geometry data is being rendered by the rendering circuit 152.

The rendering circuit 152 performs the edgewalking function in a single pixel cycle time in order to prevent slowing of the scan conversion function. This speed is needed because the edgewalking circuit must advance to the next scan line up to five times more often than would be required of a single external edgewalking chip. Each draw processor renders only every fifth pixel of a scan line, so it finishes its share of a span in as little as one fifth of the pixel cycles.

The rendering circuit 152 performs rasterization algorithms for triangles, anti-aliased vectors, aliased vectors, anti-aliased dots, and aliased dots. The memory control circuit 153 generates the address and control signals required to transfer pixel data to interleave bank_0 61 over a memory bus 188.

Each draw processor 50-54 implements a high-accuracy DDA algorithm that provides sub-pixel accuracy using thirty-two-bit internal processing units. Aliased and anti-aliased lines and dots are rendered in the distributed manner previously described, wherein each draw processor 50-54 processes every fifth pixel along a scan line.
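The combination of a fixed-point DDA and five-way pixel interleaving can be sketched as follows. The sketch assumes a 16.16 split of the 32-bit word and handles only left-to-right x-major lines with integer x endpoints. Each processor owns the columns where x mod 5 equals its index. It starts at its first owned column and then advances y by five times the per-pixel increment. Because the arithmetic is integer, the union of the five outputs is exactly what one serial DDA pass would produce. The function and parameter names are hypothetical.

```python
FRAC_BITS = 16                  # assumed 16.16 split of a 32-bit fixed-point word
HALF = 1 << (FRAC_BITS - 1)     # 0.5 in fixed point, used for rounding

def to_fixed(v: float) -> int:
    return int(round(v * (1 << FRAC_BITS)))

def dda_pixels(x0, y0, x1, y1, proc_id, num_procs=5):
    """Pixels of an x-major line owned by draw processor `proc_id` (x % num_procs == proc_id)."""
    if not (x1 > x0 and abs(y1 - y0) <= x1 - x0):
        raise ValueError("sketch handles left-to-right x-major lines only")
    dy = to_fixed((y1 - y0) / (x1 - x0))         # fixed-point y increment per pixel
    first = x0 + (proc_id - x0) % num_procs       # first column this processor owns
    y = to_fixed(y0) + dy * (first - x0) + HALF   # y at that column, biased by 0.5
    step = dy * num_procs                         # y advance between owned columns
    pixels = []
    for x in range(first, x1 + 1, num_procs):
        pixels.append((x, y >> FRAC_BITS))        # floor of biased y = rounded y
        y += step
    return pixels
```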

Each draw processor 50-54 also implements the rendering portions of the X11 operations in coordination with the X11 operations circuit of the command preprocessor 30. The X11 operations include reading and writing groups of pixels for vertical scrolls, raster operations, and stencil operations.
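X11 raster operations combine a source value with the destination pixel using one of the sixteen standard GX function codes (GXclear = 0x0 through GXset = 0xF). Each code is a 4-bit truth table over the source and destination bits. The per-pixel step can be sketched as below. The codes are the standard X11 values; how the draw processors apply them in hardware is not described in the text, and the function name and 8-bit mask are assumptions.

```python
# A few of the standard X11 GX function codes.
GXclear, GXand, GXcopy, GXnoop, GXxor, GXinvert, GXset = 0x0, 0x1, 0x3, 0x5, 0x6, 0xA, 0xF

def raster_op(func: int, src: int, dst: int, mask: int = 0xFF) -> int:
    """Apply a 4-bit X11 raster-op code bitwise to source and destination pixel values.

    Bit 0 of `func` selects src AND dst, bit 1 src AND NOT dst,
    bit 2 NOT src AND dst, bit 3 NOT src AND NOT dst; the selected terms are ORed.
    """
    result = 0
    if func & 0x1:
        result |= src & dst
    if func & 0x2:
        result |= src & ~dst
    if func & 0x4:
        result |= ~src & dst
    if func & 0x8:
        result |= ~src & ~dst
    return result & mask        # keep only the pixel's bits
```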

In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are accordingly to be regarded as illustrative rather than restrictive.



Claims 

1. A graphics accelerator, comprising: 

a command preprocessor having a 3D geometry pipeline and a direct port pipeline, the command preprocessor accessing input vertex packets and direct port data over the host bus, the 3D geometry pipeline reformatting the input vertex packets into reformatted vertex packets according to predefined vertex format, the 3D geometry pipeline assembling the reformatted vertex packets into an output geometry packet and transferring the output geometry packet over a floating-point bus, the direct port pipeline transferring the direct port data over a draw bus;

at least one floating-point processor coupled to communicate over the floating-point bus, the floating-point processor receiving the output geometry packet over the floating-point bus, generating a draw packet, and transferring the draw packet over the draw bus, the draw packet containing parameters that define a geometry object;

a plurality of draw processors concurrently receiving the draw packet over the draw bus, each draw processor performing edgewalking and scan interpolation functions corresponding to the geometry object, such that each draw processor renders a subset of pixels corresponding to the geometry object;

a frame buffer comprising a plurality of interleave banks, each interleave bank receiving the subset of pixels from one of the draw processors.

2. The graphics accelerator of claim 1, wherein the command preprocessor accesses the input vertex packets over the host bus according to a direct memory access protocol over the host bus.

3. The graphics accelerator of claim 2, wherein the command preprocessor receives virtual memory pointers over the host bus, the virtual memory pointers pointing to a geometry data array in a memory subsystem, the geometry data array containing the input vertex packets, the command preprocessor translating the virtual memory pointers into physical memory pointers for reading the geometry data array over the host bus.

4. The graphics accelerator of claim 1, wherein the command preprocessor accesses the input vertex packets according to a programmed input/output communication protocol over the host bus.

5. The graphics accelerator of claim 1, wherein the floating-point processor comprises a multiple entry input buffer that receives the output geometry packet over the floating-point bus, the floating-point processor transferring a buffer status signal to the command preprocessor over the floating-point bus indicating whether an entry of the input buffer is available.

6. The graphics accelerator of claim 1, wherein the floating-point processor comprises an output buffer that holds the draw packet, the floating-point processor receiving a control signal from the command preprocessor over the draw bus, the control signal causing the output buffer to transfer the draw packet over the draw bus.

7. The graphics accelerator of claim 1, wherein the draw processors comprise a set of five draw processors, such that each draw processor renders every fifth pixel per scan line corresponding to the geometry object.

8. The graphics accelerator of claim 7, wherein the frame buffer comprises a set of five interleaved video random access memory (VRAM) banks.

9. The graphics accelerator of claim 8, wherein each draw processor comprises a memory circuit for accessing a separate interleave VRAM bank of the frame buffer.

10. A method for rendering geometry objects, comprising the steps of:

accessing input vertex packets over a host bus, and reformatting the input vertex packets into reformatted vertex packets according to predefined vertex format;

assembling the reformatted vertex packets into an output geometry packet and transferring the output geometry packet over a floating-point bus;

receiving the reformatted vertex packet over the floating-point bus, and generating a draw packet, such that the draw packet contains parameters that define a geometry object;

transferring the draw packet over a draw bus;

receiving the draw packet over the draw bus, and performing edgewalking and scan interpolation functions corresponding to the geometry object such that a subset of pixels corresponding to the geometry object are rendered;

transferring the subset of pixels to an interleave bank of an interleaved frame buffer.

11. The method of claim 10, wherein the step of accessing input vertex packets over a host bus comprises the step of accessing the input vertex packets over the host bus according to a direct memory access protocol over the host bus.

12. The method of claim 11, wherein the step of accessing input vertex packets over a host bus comprises the steps of:

receiving virtual memory pointers over the host bus, the virtual memory pointers pointing to a geometry data array in a memory subsystem, the geometry data array containing the input vertex packets;

translating the virtual memory pointers into physical memory pointers;

reading the geometry data array over the host bus according to the physical memory pointers.

13. The method of claim 10, wherein the step of accessing input vertex packets over a host bus comprises the step of accessing the input vertex packets according to a programmed input/output communication protocol over the host bus.

14. The method of claim 10, wherein the step of transferring the output geometry packet over a floating-point bus comprises the steps of:

sensing a buffer status signal over the floating-point bus indicating whether an entry in an input buffer is available;

transferring the output geometry packet to the input buffer over the floating-point bus if the input buffer is available.

15. The method of claim 10, wherein the step of transferring the draw packet over a draw bus comprises the steps of:

sensing a control signal over the draw bus;

transferring the draw packet over a draw bus if the control signal is sensed.

16. The method of claim 10, wherein the subset of pixels comprises every fifth pixel per scan line corresponding to the three dimensional triangle.

17. The method of claim 16, wherein the frame buffer comprises a set of five interleaved video random access memory (VRAM) banks.
Figure 2: block diagram (command preprocessor; CD-BUS; floating-point processors; draw processors; VRAM interleave banks 0 to 4; post-processor; RAMDAC).




EP 0 627 700 A3 



European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 94 30 2410

DOCUMENTS CONSIDERED TO BE RELEVANT

Citation of document with indication, where appropriate, of relevant passages; relevant to claim:

IRE WESCON CONVENTION RECORD, vol. 31, 1987, NORTH HOLLYWOOD US, pages 1-7, XP35624; SUN AND CATES, 'HIGH PERFORMANCE 3-D GRAPHICS PROCESSING'; * the whole document *. Relevant to claims 1-17.

IEEE COMPUTER GRAPHICS AND APPLICATIONS, vol. 9, no. 4, 1989, NEW YORK US, pages 56-62, XP115866; BORDEN, 'GRAPHICS PROCESSING ON A GRAPHICS SUPERCOMPUTER'; * the whole document *. Relevant to claims 1-17.

Classification of the application (Int.Cl.5): G06F15/72
Technical fields searched (Int.Cl.5): G06T

The present search report has been drawn up for all claims.

Place of search: THE HAGUE
Date of completion of the search: 8 December 1994
Examiner: BURGAUD, C

CATEGORY OF CITED DOCUMENTS
X: particularly relevant if taken alone
Y: particularly relevant if combined with another document of the same category
A: technological background
O: non-written disclosure
P: intermediate document
T: theory or principle underlying the invention
E: earlier patent document, but published on, or after the filing date
D: document cited in the application
L: document cited for other reasons
&: member of the same patent family, corresponding document